A New Memory MapReduce Framework for Higher Access to Resources
نویسندگان
چکیده
The demand for highly parallel data processing platform was growing due to an explosion in the number of massive-scale data applications both in academia and industry. MapReduce was one of the most meaningful solutions to deal with big data distributed computing. This paper was based on the work of Hadoop MapReduce. In the face of massive data computing and calculation process, MapReduce generated a lot of dynamic data, but these data were discarded after the task completed. Meanwhile, a large number of dynamic data were written to HDFS during task execution, caused much unnecessary IO cost. In this paper, we analyzed existing distributed caching mechanism and proposed a new Memory MapReduce framework that has a real-time response to read or write request from task nodes, maintain related information about cache data. After performance testing, we could clearly find MapReduce with cache significantly improved in IO performance.
منابع مشابه
MapReduce-Style Computation in Distributed Virtual Memory
Many cloud computing technologies, such as MapReduce, use file systems as the system-wide substrate for data handling. A distributed file system provides a global name space and stores data persistently, but it also introduces significant overhead. Several recent systems use DRAM to store data and tremendously improve the performance of cloud computing systems. However, both our own experience ...
متن کاملAn Enhanced Map Reduce Framework for Improving the Performance of Massively Scalable Private Clouds
Cloud Computing systems provide access to large amount of data and other resources through a large number of interfaces. Apache Hadoop is a framework that allows distributed processing of large sets of data across cluster of computers. It is a powerful abstraction proposed for making scalable and fault tolerant applications. In this paper we have suggested an enhanced framework for MapReduce wh...
متن کاملI/O Throttling and Coordination for MapReduce
As a leading framework for data intensive computing, MapReduce has gained enormous popularity in large-scale data analysis. With the increasing adoption of multi/many core platform, more and more MapReduce tasks are now running on the same node and sharing the same storage resources. The concurrency of tasks raises the issue of I/O stream congestion. We have observed significant throughput drop...
متن کاملA review of methods for resource allocation and operational framework in cloud computing
The issue of management and allocation of resources in cloud computing environments, according to the breadth of scale and modern technology implementation, is a complicated issue. Issues such as: the heterogeneity of resources, resource dependencies to each other, the dynamics of the environment, virtualization, workload diversity as well as a wide range of management objectives of cloud servi...
متن کاملEnhancing Map-Reduce Framework for Bigdata with Hierarchical Clustering
MapReduce is a software framework that allows certain kinds of parallelizable or distributable problems involving large data sets to be solved using computing clusters. This paper introduces our experience of grouping internet users by mining a huge volume of web access log of up to 500 gigabytes. The application is realized using hierarchical clustering algorithms with Map-Reduce, a parallel p...
متن کامل